AITopics | human pose

Collaborating Authors

human pose

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LI-GoOuOuInpFeMrtupstut

Neural Information Processing SystemsJun-16-2026, 19:15:14 GMT

We tackle the task of recovering an animatable 3D human avatar from a single or a sparse set of images. For this task, beyond a set of images, many prior state-of-theart methods use accurate "ground-truth" camera poses and human poses as input to guide reconstruction at test-time. We show that pose-dependent reconstruction degrades results significantly if pose estimates are noisy. To overcome this, we introduce NoPo-Avatar, which reconstructs avatars solely from images, without any pose input. By removing the dependence of test-time reconstruction on human poses, NoPo-Avatar is not affected by noisy human pose estimates, making it more widely applicable.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Humans in Kitchens: ADataset for Multi-Person Human Motion Forecasting with Scene Context

Neural Information Processing SystemsApr-25-2026, 18:55:11 GMT

Forecasting human motion of multiple persons is very challenging. It requires to model the interactions between humans and the interactions with objects and the environment. For example, a person might want to make a coffee, but if the coffee machine is already occupied the person will have to wait. These complex relations between scene geometry and persons arise constantly in our daily lives, and models that wish to accurately forecast human behavior will have to take them into consideration. To facilitate research in this direction, we propose Humans in Kitchens, a large-scale multi-person human motion dataset with annotated 3D human poses, scene geometry and activities per person and frame. Our dataset consists of over 7.3h recorded data of up to 16 persons at the same time in four kitchen scenes, with more than 4M annotated human poses, represented by a parametric 3D body model. In addition, dynamic scene geometry and objects like chair or cupboard are annotated per frame. As first benchmarks, we propose two protocols for short-term and long-term human motion forecasting.

artificial intelligence, dataset, machine learning, (13 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.37)

Add feedback

Synthetic-to-Real Pose Estimation with Geometric Reconstruction Qiuxia Lin 1 Kerui Gu1 Linlin Y ang 2, 3 Angela Y ao 1 1

Neural Information Processing SystemsFeb-16-2026, 09:31:30 GMT

The warping estimation module W is based on an hourglass with five conv3 3 - bn - relu - pool2 2 in the encoders and five upsample2 2 - conv3 3 - bn - relu blocks in the decoders. In G, we use the Johnson architecture [ 3 ] with two down-sampling blocks, six residual-blocks and two up-sampling blocks. The design follows [ 7 ]. The inputs are the base image, displacement field, and inpainting map. It downsampled 4 and upsampled 4 to get the output, i.e. the reconstructed image.

artificial intelligence, geometric reconstruction qiuxia lin 1, video understanding, (13 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.43)

Add feedback

2052b3e0617ecb2ce9474a6feaf422b3-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-8-2026, 19:16:24 GMT

computer vision, dataset, interaction, (11 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Learning Disentangled Representation for Robust Person Re-identification

Neural Information Processing SystemsDec-26-2025, 00:36:30 GMT

We address the problem of person re-identification (reID), that is, retrieving person images from a large dataset, given a query image of the person of interest. The key challenge is to learn person representations robust to intra-class variations, as different persons can have the same attribute and the same person's appearance looks different with viewpoint changes. Recent reID methods focus on learning discriminative features but robust to only a particular factor of variations (e.g., human pose) and this requires corresponding supervisory signals (e.g., pose annotations). To tackle this problem, we propose to disentangle identity-related and -unrelated features from person images. Identity-related features contain information useful for specifying a particular person (e.g.,clothing), while identity-unrelated ones hold other factors (e.g., human pose, scale changes). To this end, we introduce a new generative adversarial network, dubbed identity shuffle GAN (IS-GAN), that factorizes these features using identification labels without any auxiliary information. We also propose an identity shuffling technique to regularize the disentangled features. Experimental results demonstrate the effectiveness of IS-GAN, largely outperforming the state of the art on standard reID benchmarks including the Market-1501, CUHK03 and DukeMTMC-reID. Our code and models will be available online at the time of the publication.

learning disentangled representation, name change, robust person re-identification, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Add feedback

Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context

Neural Information Processing SystemsDec-24-2025, 04:36:53 GMT

Forecasting human motion of multiple persons is very challenging. It requires to model the interactions between humans and the interactions with objects and the environment. For example, a person might want to make a coffee, but if the coffee machine is already occupied the person will haveto wait. These complex relations between scene geometry and persons ariseconstantly in our daily lives, and models that wish to accurately forecasthuman behavior will have to take them into consideration. To facilitate research in this direction, we propose Humans in Kitchens, alarge-scale multi-person human motion dataset with annotated 3D human poses, scene geometry and activities per person and frame.Our dataset consists of over 7.3h recorded data of up to 16 persons at the same time in four kitchen scenes, with more than 4M annotated human poses, represented by a parametric 3D body model. In addition, dynamic scene geometry and objects like chair or cupboard are annotated per frame. As first benchmarks, we propose two protocols for short-term and long-term human motion forecasting.

kitchen, multi-person human motion forecasting, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

Synthetic-to-Real Pose Estimation with Geometric Reconstruction Qiuxia Lin 1 Kerui Gu1 Linlin Y ang 2, 3 Angela Y ao 1 1

Neural Information Processing SystemsOct-9-2025, 04:00:29 GMT

artificial intelligence, geometric reconstruction qiuxia lin 1, video understanding, (13 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.43)

Add feedback

Chirality Nets for Human Pose Regression

Raymond Yeh, Yuan-Ting Hu, Alexander Schwing

Neural Information Processing SystemsOct-2-2025, 08:28:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, pose estimation, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

Neural Information Processing SystemsSep-30-2025, 09:27:38 GMT

We present a method for estimating articulated human pose from a single static image based on a graphical model with novel pairwise relations that make adaptive use of local image measurements. More precisely, we specify a graphical model for human pose which exploits the fact the local image measurements can be used both to detect parts (or joints) and also to predict the spatial relationships between them (Image Dependent Pairwise Relations). These spatial relationships are represented by a mixture model. We use Deep Convolutional Neural Networks (DCNNs) to learn conditional probabilities for the presence of parts and their spatial relationships within image patches. Hence our model combines the representational flexibility of graphical models with the efficiency and statistical power of DCNNs. Our method significantly outperforms the state of the art methods on the LSP and FLIC datasets and also performs very well on the Buffy dataset without any training.

articulated pose estimation, graphical model, image dependent pairwise relation, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark

Kim, Junsu, Kim, Naeun, Lee, Jaeho, Park, Incheol, Han, Dongyoon, Baek, Seungryul

arXiv.org Artificial IntelligenceJul-18-2025

The reasoning-based pose estimation (RPE) benchmark has emerged as a widely adopted evaluation standard for pose-aware multimodal large language models (MLLMs). Despite its significance, we identified critical reproducibility and benchmark-quality issues that hinder fair and consistent quantitative evaluations. Most notably, the benchmark utilizes different image indices from those of the original 3DPW dataset, forcing researchers into tedious and error-prone manual matching processes to obtain accurate ground-truth (GT) annotations for quantitative metrics ( e.g ., MPJPE, P A-MPJPE). Furthermore, our analysis reveals several inherent benchmark-quality limitations, including significant image redundancy, scenario imbalance, overly simplistic poses, and ambiguous textual descriptions, collectively undermining reliable evaluations across diverse scenarios. T o alleviate manual effort and enhance reproducibility, we carefully refined the GT annotations through meticulous visual matching and publicly release these refined annotations as an open-source resource, thereby promoting consistent quantitative evaluations and facilitating future advancements in human pose-aware mul-timodal reasoning.

benchmark, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.13314

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.73)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback